59 research outputs found

    Mapping the visual magnitude of popular tourist sites in Edinburgh city

    Get PDF
    There is value in being able to automatically measure and visualise the visual magnitude of city sites (monuments, buildings, tourist sites) – for example in urban planning, as an aid to automated wayfinding, or in augmented reality city guides. Here we present the outputs of an algorithm able to calculate visual magnitude, both as an absolute measure of façade area and in terms of a building’s perceived magnitude (its diminishing apparent size with distance). Both metrics influence the photogenic nature of a site. We therefore compared the results against maps showing the locations from which geo-located Flickr images were taken. The results accord with the metrics and therefore help disambiguate the meaning of Flickr tags.
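The distance-weighted measure described above can be sketched in a few lines. This is a minimal illustration, assuming an inverse-square (solid-angle style) fall-off for perceived magnitude; the paper's exact weighting function is not specified here.

```python
def perceived_magnitude(facade_area_m2: float, distance_m: float) -> float:
    """Perceived magnitude of a façade, scaled by inverse-square distance.

    Assumption: perceived size falls off with the square of viewing
    distance, as for solid angle; the published model may differ.
    """
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    return facade_area_m2 / (distance_m ** 2)

# The same 200 m² façade contributes four times less at twice the distance.
near = perceived_magnitude(200.0, 50.0)   # 0.08
far = perceived_magnitude(200.0, 100.0)   # 0.02
```

Viewpoints where this score peaks for a site would then be the candidates to compare against the Flickr photo-location maps.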

    Media Mapping: Using Georeferenced Images and Audio to provide supporting information for the Analysis of Environmental Sensor Datasets

    Get PDF
    Field-based environmental monitoring projects often fail to gather supporting temporal information on the surroundings, yet these external factors may play a significant part in understanding variations in the collected datasets. For example, when sampling air quality the values may change as a result of a bus passing the sampling point, yet this temporally local information is difficult to capture at a consistently high resolution over extended time periods. Here we develop an application which runs on a mobile phone, able to capture visual and audio data with corresponding time and location details. We also develop a desktop analysis tool which synchronises the display of this dataset with those captured from environmental sensors. The result is a tool able to assist researchers in understanding local changes in environmental datasets as a result of changes in the nearby surrounding environment.
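The core of the synchronisation step is aligning a media capture timestamp with the nearest sensor reading. A minimal sketch, assuming sorted sensor timestamps (the actual tool's data model is not described here):

```python
import bisect

def nearest_reading(sensor_times, sensor_values, media_time):
    """Return the sensor value recorded closest in time to a media capture.

    sensor_times must be sorted ascending; a binary search finds the
    neighbouring samples and the closer of the two is chosen.
    """
    i = bisect.bisect_left(sensor_times, media_time)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(sensor_times)]
    best = min(candidates, key=lambda j: abs(sensor_times[j] - media_time))
    return sensor_values[best]

# Hypothetical air-quality samples every 10 s; a photo taken at t=23 s
# is paired with the t=20 s reading.
times = [0, 10, 20, 30, 40]
values = [12.1, 12.3, 18.7, 13.0, 12.2]
nearest_reading(times, values, 23)  # 18.7
```

With this pairing, a spike such as the 18.7 reading can be displayed alongside the photo or audio clip captured at that moment, revealing (for instance) the passing bus.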

    Building colour terms: A combined GIS and stereo vision approach to identifying building pixels in images to determine appropriate colour terms

    Get PDF
    Color information is a useful attribute to include in a building’s description to assist the listener in identifying the intended target. Often this information is only available as image data, and is not readily accessible for use in constructing referring expressions for verbal communication. The method presented uses a GIS building polygon layer in conjunction with street-level captured imagery to automatically filter foreground objects and select pixels which correspond to building façades. These selected pixels are then used to define the most appropriate color term for the building, and a corresponding fuzzy color term histogram. The technique uses a single camera capturing images at a high frame rate, with the baseline distance between frames calculated from a GPS speed log. The expected distance from the camera to the building is measured from the polygon layer and refined from the calculated depth map, after which building pixels are selected. In addition, significant foreground planar surfaces between the known road edge and building façade are identified as possible boundary walls and hedges. The output is a dataset of the most appropriate color terms for both the building and its boundary walls. Initial trials demonstrate the usefulness of the technique in automatically capturing color terms for buildings in urban regions.
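Once façade pixels have been selected, mapping them to colour terms can be sketched as a nearest-anchor lookup plus a normalised histogram. The anchor colours and term set below are illustrative assumptions, not the paper's calibrated fuzzy model:

```python
from collections import Counter

# Hypothetical RGB anchors for a handful of basic colour terms.
COLOUR_TERMS = {
    "red": (200, 40, 40),
    "grey": (128, 128, 128),
    "white": (240, 240, 240),
    "brown": (120, 80, 50),
}

def colour_term(rgb):
    """Nearest colour term by squared Euclidean distance in RGB space."""
    return min(
        COLOUR_TERMS,
        key=lambda t: sum((a - b) ** 2 for a, b in zip(COLOUR_TERMS[t], rgb)),
    )

def term_histogram(pixels):
    """Normalised colour-term histogram over the selected façade pixels."""
    counts = Counter(colour_term(p) for p in pixels)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

# Two brownish pixels and one grey pixel -> "brown" dominates.
hist = term_histogram([(118, 82, 48), (125, 78, 55), (130, 130, 130)])
best_term = max(hist, key=hist.get)  # 'brown'
```

The published method works in a fuzzy colour space rather than raw RGB, but the histogram-then-dominant-term shape of the output is the same.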

    SimpleMTOD: A Simple Language Model for Multimodal Task-Oriented Dialogue with Symbolic Scene Representation

    Full text link
    SimpleMTOD is a simple language model which recasts several sub-tasks in multimodal task-oriented dialogue as sequence prediction tasks. SimpleMTOD is built on a large-scale transformer-based auto-regressive architecture, which has already proven successful in uni-modal task-oriented dialogue, and effectively leverages transfer learning from pre-trained GPT-2. In order to capture the semantics of visual scenes, we introduce both local and de-localized tokens for objects within a scene. De-localized tokens represent the type of an object rather than the specific object itself, and so possess a consistent meaning across the dataset. SimpleMTOD achieves a state-of-the-art BLEU score (0.327) in the Response Generation sub-task of the SIMMC 2.0 test-std dataset while performing on par in the other multimodal sub-tasks: Disambiguation, Coreference Resolution, and Dialog State Tracking. This is despite taking a minimalist approach to extracting visual (and non-visual) information. In addition, the model does not rely on task-specific architectural changes such as classification heads.
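The de-localized token idea can be illustrated concretely. In the sketch below (object ids and type names are hypothetical, not from SIMMC 2.0), two distinct jacket instances collapse to the same type token, which is what gives the token a consistent meaning across scenes:

```python
# Hypothetical scene: each object has an instance id and a catalogue type.
scene = [
    {"id": "obj_17", "type": "jacket"},
    {"id": "obj_42", "type": "jacket"},
    {"id": "obj_03", "type": "shelf"},
]

def delocalize(obj):
    """De-localized token: encodes the object's *type*, not its identity,
    so the same token means the same thing in every scene of the dataset."""
    return f"<{obj['type']}>"

tokens = [delocalize(o) for o in scene]
# ['<jacket>', '<jacket>', '<shelf>'] - both jacket instances share one token.
```

A local token, by contrast, would encode the instance id (`<obj_17>`), which only has meaning within its own scene.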

    Identifying related landmark tags in urban scenes using spatial and semantic clustering

    Get PDF
    There is considerable interest in developing landmark saliency models as a basis for describing urban landscapes, and in constructing wayfinding instructions, for text and spoken dialogue based systems. The challenge lies in knowing the truthfulness of such models: is what the model considers salient the same as what is perceived by the user? This paper presents a web-based experiment in which users were asked to tag and label the most salient features from urban images for the purposes of navigation and exploration. In order to rank landmark popularity in each scene it was necessary to determine which tags related to the same object (e.g. tags relating to a particular café). Existing clustering techniques did not perform well for this task, and it was therefore necessary to develop a new spatial-semantic clustering method which considered the proximity of nearby tags and the similarity of their label content. The annotation similarity was initially calculated using trigrams in conjunction with a synonym list, generating a set of networks formed from the links between related tags. These networks were used to build related word lists encapsulating conceptual connections (e.g. church tower related to clock) so that during a secondary pass of the data related network segments could be merged. This approach gives interesting insight into the partonomic relationships between the constituent parts of landmarks and the range and frequency of terms used to describe them. The knowledge gained from this will be used to help calibrate a landmark saliency model, and to gain a deeper understanding of the terms typically associated with different types of landmarks.
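The semantic half of the clustering hinges on a trigram similarity between tag labels. A minimal sketch using Jaccard overlap of character trigrams (a common scoring choice; the paper's exact scoring and its synonym handling may differ):

```python
def trigrams(label: str) -> set:
    """Character trigrams of a lower-cased, padded label."""
    s = f"  {label.lower().strip()} "
    return {s[i:i + 3] for i in range(len(s) - 2)}

def trigram_similarity(a: str, b: str) -> float:
    """Jaccard overlap of the two labels' trigram sets, in [0, 1]."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)

# Spelling variants score high; unrelated labels score low, so spatially
# close tags can be linked only when their labels also agree.
trigram_similarity("clock tower", "clocktower")  # high
trigram_similarity("clock tower", "cafe")        # low
```

In the clustering itself, two tags would be linked when they are both spatially close in the image and above a similarity threshold, and the resulting link networks merged in the second pass.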

    The REAL corpus: A crowd-sourced corpus of human-generated and evaluated spatial references to real-world urban scenes

    Get PDF
    We present a newly crowd-sourced dataset of natural language references to objects anchored in complex urban scenes (in short: the REAL Corpus – Referring Expressions Anchored Language). The REAL corpus contains a collection of images of real-world urban scenes together with verbal descriptions of target objects generated by humans, paired with data on how successfully other people were able to identify the same objects based on these descriptions. In total, the corpus contains 32 images with, on average, 27 descriptions per image and 3 verifications for each description. In addition, the corpus is annotated with a variety of linguistically motivated features. The paper highlights issues posed by collecting data using crowd-sourcing with an unrestricted input format, as well as by using real-world urban scenes. The corpus will be released via the ELRA repository as part of this submission.
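The image / description / verification nesting described above suggests a simple record layout. The field names below are illustrative assumptions, not the released ELRA schema:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Description:
    """One human-generated referring expression plus its verifications."""
    text: str
    verifications: List[bool] = field(default_factory=list)  # did verifiers find the target?

@dataclass
class RealCorpusImage:
    """Hypothetical record for one REAL corpus scene (illustrative fields)."""
    image_id: str
    target_object: str
    descriptions: List[Description] = field(default_factory=list)

entry = RealCorpusImage(
    image_id="scene_01",
    target_object="red door",
    descriptions=[Description("the red door left of the cafe", [True, True, False])],
)
# With 3 verifications per description, success rate is simply hits / 3.
success_rate = sum(entry.descriptions[0].verifications) / 3
```

Pairing each description with its verification outcomes is what lets the corpus measure how *effective* a referring expression is, not just collect it.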

    Spatial and Temporal Geovisualisation and Data Mining of Road Traffic Accidents in Christchurch, New Zealand

    Get PDF
    This paper outlines the development of a method for using Kernel Estimation cluster analysis techniques to automatically identify road traffic accident 'black spots' and 'black areas'. A novel data-mining approach has been developed, adding to the generic exploratory spatial analysis toolkit. Christchurch, New Zealand, was selected as the study area and data from the LTNZ crash database were used to trial the technique. A GIS and Python scripting were used to implement the solution, combining spatial data for average traffic flows with the recorded accident locations. Kernel Estimation was able to identify the accident clusters and, when used in conjunction with Monte Carlo simulation techniques, was able to identify statistically significant clusters.
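The kernel-estimation-plus-Monte-Carlo test can be sketched compactly. This is a simplified illustration with an assumed Gaussian kernel, uniform random scatter as the null model, and made-up units; the published method conditions on traffic flow and works over a full study area rather than a single query point:

```python
import math
import random

def kernel_density(point, accidents, bandwidth=100.0):
    """Gaussian kernel density of accident locations at a query point (metres)."""
    return sum(
        math.exp(-((point[0] - x) ** 2 + (point[1] - y) ** 2) / (2 * bandwidth ** 2))
        for x, y in accidents
    )

def black_spot_p_value(point, accidents, extent=1000.0, runs=200, seed=0):
    """Monte Carlo test: fraction of uniform random scatters of the same
    number of accidents that reach a density at `point` at least as high
    as the observed one. Small values mark a significant cluster."""
    rng = random.Random(seed)
    observed = kernel_density(point, accidents)
    hits = 0
    for _ in range(runs):
        scatter = [(rng.uniform(0, extent), rng.uniform(0, extent)) for _ in accidents]
        if kernel_density(point, scatter) >= observed:
            hits += 1
    return hits / runs

# A tight cluster of 9 accidents near (500, 500) in a 1 km square study area.
accidents = [(500 + dx, 500 + dy) for dx in (-5, 0, 5) for dy in (-5, 0, 5)]
p = black_spot_p_value((500, 500), accidents)  # essentially zero
```

A real implementation would evaluate the density over a grid and flag every cell whose p-value falls below the chosen significance level as part of a black spot or black area.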

    Conversational natural language interaction for place-related knowledge acquisition

    Get PDF
    We focus on the problems of using Natural Language interaction to support pedestrians in their place-related knowledge acquisition. Our case study for this discussion is a smartphone-based Natural Language interface that allows users to acquire spatial and cultural knowledge of a city. The framework consists of a spoken dialogue-based information system and a smartphone client. The system is novel in combining geographic information system (GIS) modules, such as a visibility engine, with a question-answering (QA) system. Users can use the smartphone client to engage in a variety of interleaved conversations, such as navigating from A to B, using the QA functionality to learn more about points of interest (PoI) nearby, and searching for amenities and tourist attractions. This system explores a variety of research questions involving Natural Language interaction for the acquisition of knowledge about space and place.
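Interleaving navigation, QA, and amenity search implies some form of dispatch over user utterances. The keyword-cue dispatcher below is purely illustrative (a deployed spoken dialogue system would use a trained dialogue manager, and all cue phrases here are assumptions):

```python
# Hypothetical cue phrases for routing an utterance to a sub-system.
NAV_CUES = ("take me", "navigate", "how do i get")
QA_CUES = ("what is", "who", "when was", "tell me about")
SEARCH_CUES = ("find", "nearest", "where can i")

def route(utterance: str) -> str:
    """Route an utterance to navigation, QA, or amenity search;
    fall back to a clarification sub-dialogue when no cue matches."""
    u = utterance.lower()
    if any(c in u for c in NAV_CUES):
        return "navigation"
    if any(c in u for c in QA_CUES):
        return "question_answering"
    if any(c in u for c in SEARCH_CUES):
        return "amenity_search"
    return "clarify"

route("Tell me about the Scott Monument")  # 'question_answering'
```

Because all three sub-systems share one dialogue context, a QA answer about a point of interest can seamlessly become the destination of the next navigation request.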

    Torque ripple and input current ripple suppression techniques for switched reluctance motors for electric vehicles

    Get PDF
    We present a city navigation and tourist information mobile dialogue app with integrated question-answering (QA) and geographic information system (GIS) modules that helps pedestrian users to navigate in, and learn about, urban environments. In contrast to existing mobile apps, which treat these problems independently, our Android app addresses the problems of navigation and touristic question-answering in an integrated fashion using a shared dialogue context. We evaluated our system in comparison with Samsung S-Voice (which interfaces to Google navigation and Google search) with 17 users, and found that users judged our system to be significantly more interesting to interact with and learn from. They also rated our system above Google search (with the Samsung S-Voice interface) for tourist information tasks.